4 (a) selecting a frame cluster in said input video sequence which 

5 corresponds to a most static one of said video segments; 

6 (b) computing a content value in said selected frame cluster; 

7 (c) using said computed content value to cluster remaining frames 

8 in said input video sequence. 

1 2. The method of claim 1, wherein in said (a) said frame cluster is 

2 selected using a refined feature space representation of said input video sequence. 

1 3. The method of claim 1, wherein in said (a) each of said plurality of 

2 frames is transformed into a histogram vector indicative of a spatial distribution of 

3 colors in said each of said plurality of frames. 

1 4. The method of claim 3, wherein in said (a) each of said plurality of 

2 frames is divided into a plurality of blocks, each of said plurality of blocks being 

3 represented by a histogram in a color space indicative of a distribution of colors 

4 within each of said plurality of blocks. 

1 5. The method of claim 3, wherein each of said plurality of frames is 

2 divided into a plurality of blocks and each said histogram vector comprises a plurality 

3 of histograms in a color space, each of said plurality of histograms corresponding to 

4 one of said plurality of blocks. 
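
The block-wise histogram vector of claims 3 to 5 can be illustrated with a minimal sketch. The grid size (`blocks_per_side`), the RGB color space, and the quantization (`bins_per_channel`) are assumptions chosen for illustration; the claims do not fix any of them.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255 (assumed color space)


def block_histogram_vector(frame: List[List[Pixel]],
                           blocks_per_side: int = 3,
                           bins_per_channel: int = 2) -> List[int]:
    """Divide a frame into a grid of blocks and concatenate one color
    histogram per block into a single histogram vector."""
    height, width = len(frame), len(frame[0])
    n_bins = bins_per_channel ** 3
    vector = [0] * (blocks_per_side * blocks_per_side * n_bins)
    for y in range(height):
        for x in range(width):
            # Locate the block that contains this pixel.
            by = y * blocks_per_side // height
            bx = x * blocks_per_side // width
            block = by * blocks_per_side + bx
            # Quantize each channel into bins_per_channel levels.
            r, g, b = (c * bins_per_channel // 256 for c in frame[y][x])
            color_bin = (r * bins_per_channel + g) * bins_per_channel + b
            vector[block * n_bins + color_bin] += 1
    return vector
```

Because each block keeps its own histogram, the resulting vector reflects where colors occur in the frame and not only how often they occur.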

1 6. The method of claim 2, wherein said refined feature space 

2 representation is obtained using a singular value decomposition of said input video 

3 sequence. 

1 7. The method of claim 6, wherein said singular value decomposition is 

2 performed using frames selected with a fixed interval from said input video sequence. 

1 8. The method of claim 7, wherein said selected frames are arranged into 

2 a feature frame matrix, and wherein said singular value decomposition is performed 

3 on said feature frame matrix. 
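
A minimal sketch of the fixed-interval sampling and matrix construction recited in claims 7 and 8 follows. The function name `feature_frame_matrix` and the one-column-per-frame layout are assumptions for illustration; the singular value decomposition itself is not shown.

```python
from typing import List, Sequence


def feature_frame_matrix(frame_vectors: Sequence[Sequence[float]],
                         interval: int) -> List[List[float]]:
    """Sample every `interval`-th frame vector and arrange the samples as the
    columns of a matrix (rows = features, columns = sampled frames)."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    sampled = list(frame_vectors[::interval])
    n_features = len(sampled[0])
    # Transpose so that each sampled frame becomes one column.
    return [[vec[k] for vec in sampled] for k in range(n_features)]
```

With five two-element frame vectors and `interval=2`, frames 0, 2 and 4 are kept and the result is a 2 x 3 matrix.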
1 9. The method of claim 6, wherein said singular value decomposition 

2 produces a matrix, each column of said matrix representing a frame in a refined 

3 feature space corresponding to a frame in said input video sequence. 



1 10. The method of claim 1, further comprising (d) using said clustered 

2 frames to output a motion video representative of a summary of said input video 

3 sequence. 

1 11. The method of claim 1, further comprising (d) outputting a plurality of 

2 keyframes, each of said plurality of keyframes representative of said clustered frames. 

1 12. The method of claim 2, wherein said selecting comprises locating a 

2 cluster closest to an origin of said refined feature space. 

1 13. The method of claim 2, wherein said (c) comprises: 

2 (c)(1) sorting a plurality of vectors in said refined feature space in 

3 ascending order according to a distance of each of said vectors 

4 to an origin of said refined feature space representation; 

5 (c)(2) selecting a vector among said sorted vectors which is closest to 

6 an origin of said refined feature space representation and 

7 including said selected vector into a first cluster; 

8 (c)(3) clustering said plurality of sorted vectors in said refined feature space 

9 into a plurality of clusters according to a distance between each 

10 of said plurality of sorted vectors and vectors in each of said 

11 plurality of clusters and an amount of information in each of 

12 said plurality of clusters. 
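
Steps (c)(1) and (c)(2) of claim 13 can be sketched as follows. Euclidean distance and the helper names are assumptions for illustration; step (c)(3) is not shown because the claim leaves the distance and information criteria open.

```python
import math
from typing import List, Sequence, Tuple


def sort_by_distance_to_origin(vectors: Sequence[Sequence[float]]) -> List[int]:
    """Return vector indices in ascending order of Euclidean distance
    to the origin of the (assumed) refined feature space."""
    def norm(i: int) -> float:
        return math.sqrt(sum(x * x for x in vectors[i]))
    return sorted(range(len(vectors)), key=norm)


def seed_first_cluster(vectors: Sequence[Sequence[float]]) -> Tuple[List[int], List[int]]:
    """Place the vector closest to the origin into the first cluster and
    return (first_cluster, remaining_indices_in_sorted_order)."""
    order = sort_by_distance_to_origin(vectors)
    return [order[0]], order[1:]
```

The indices returned in the second element stay in sorted order, so a later clustering pass can visit the remaining vectors from nearest to farthest.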

1 14. The method of claim 13, wherein in said (c)(3) said plurality of sorted 

2 vectors are clustered into said plurality of clusters such that said amount of 

3 information in each of said plurality of clusters does not exceed an amount of 

4 information in said first cluster. 
15. The method of claim 13, wherein said first cluster is composed of 
frames based on a distance variation between said frames and an average distance 
between frames in said first cluster. 



1 16. The method of claim 13, wherein each of said plurality of clusters is 

2 composed of frames based on a distance variation between said frames and an 

3 average distance between frames in said each of said plurality of clusters. 

1 17. A method for summarizing a content of an input video sequence, said 

2 method comprising: 

3 (a) selecting frames from said input video sequence, said selected 

4 frames being taken at a fixed interval; 

5 (b) creating a feature frame matrix using said selected frames; 

6 (c) performing a singular value decomposition on said feature 

7 frame matrix to obtain a matrix representing said video 

8 sequence in a refined feature space; 

9 (d) selecting a cluster in said refined feature space corresponding 

10 to a most static video segment; 

11 (e) computing a content value corresponding to said selected 

12 cluster; 

13 (f) using said computed content value to cluster frames in said 

14 input video sequence. 

1 18. A method for segmenting an input video sequence, said input video 

2 sequence comprising a plurality of frames, said plurality of frames being grouped 

3 into a plurality of video shots, said method comprising: 

4 (a) computing a similarity between each of said plurality of frames 

5 and a frame preceding said each of said plurality of frames in 

6 time; 

7 (b) segmenting said input video sequence into said plurality of 

8 video shots according to said computed similarity. 
1 19. The method of claim 18, wherein said similarity is calculated using a 

2 refined feature space representation of said input video sequence. 



1 20. The method of claim 19, wherein said refined feature space 

2 representation is created using a singular value decomposition of said input video 

3 sequence. 

1 21. The method of claim 20, wherein said singular value decomposition is 

2 performed using frames selected with a fixed interval from said input video sequence. 

1 22. The method of claim 21, wherein said selected frames are arranged 

2 into a feature frame matrix, and wherein said singular value decomposition is 

3 performed on said feature frame matrix. 

1 23. The method of claim 22, wherein said performed singular value 

2 decomposition produces a matrix, each column of said produced matrix comprising a 

3 frame in said refined feature space representing a frame in said input video sequence. 

1 24. The method of claim 18, further comprising (c) extracting features 

2 from each of said plurality of video shots. 

3 25. A method for determining a similarity between a first and a second 

4 frame in an input video sequence, said method comprising: 

5 (a) calculating a refined feature space representation of said input 

6 video sequence; 

7 (b) using said calculated representation to compute said similarity 

8 between said first and said second frames. 

9 26. The method of claim 25, wherein in said (a) said refined feature space 
10 representation is calculated using a singular value decomposition. 
1 27. The method of claim 18, wherein in said (b) said computed similarity 

2 is compared to at least a first threshold similarity and a second threshold similarity, 

3 and said input video sequence is segmented according to a result of said comparison. 

1 28. The method of claim 18, wherein if in said (b) said computed 

2 similarity is below a first threshold similarity, said each of said plurality of frames is 

3 put into a one of said plurality of video shots containing said precedent in time frame. 

4 29. The method of claim 18, wherein if in said (b) said computed 

5 similarity is above a second threshold similarity, said each of said plurality of frames 

6 is designated as a shot boundary. 

7 30. The method of claim 18, wherein if in said (b) said computed 



8 similarity is between a first threshold similarity and a second threshold similarity, 

9 said each of said plurality of frames is put into a one of said plurality of video shots 

10 according to a further analysis performed using additional frames from said plurality 

11 of frames. 
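
The two-threshold decision of claims 27 to 30 can be sketched as a three-way classification. The label strings, the parameter names `first_threshold` and `second_threshold`, and the treatment of values exactly equal to a threshold are assumptions for illustration.

```python
from typing import List, Sequence


def classify_frame(similarity: float,
                   first_threshold: float,
                   second_threshold: float) -> str:
    """Classify one frame from its computed similarity to the preceding frame."""
    if first_threshold > second_threshold:
        raise ValueError("first_threshold must not exceed second_threshold")
    if similarity < first_threshold:
        # Claim 28: the frame joins the shot containing the preceding frame.
        return "same_shot"
    if similarity > second_threshold:
        # Claim 29: the frame is designated as a shot boundary.
        return "shot_boundary"
    # Claim 30: undecided; further analysis with additional frames is needed.
    return "needs_further_analysis"


def classify_sequence(similarities: Sequence[float],
                      first_threshold: float,
                      second_threshold: float) -> List[str]:
    """Apply classify_frame to every computed similarity in order."""
    return [classify_frame(s, first_threshold, second_threshold)
            for s in similarities]
```

Only the frames labeled `needs_further_analysis` require the more expensive multi-frame examination, which is the apparent purpose of using two thresholds instead of one.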



1 31. A computer-readable medium containing a program for summarizing a 

2 content of an input video sequence, said input video sequence comprising a plurality 

3 of frames, said plurality of frames being grouped into a plurality of video segments, 

4 said program comprising: 

5 (a) selecting a frame cluster in said input video sequence which 

6 corresponds to a most static video segment; 

7 (b) computing content value in said selected frame cluster; 

8 (c) using said computed content value to cluster remaining frames 

9 in said input video sequence. 

1 32. The computer-readable medium of claim 31, wherein in said (a) said 

2 frame cluster is selected using a refined feature space representation of said input 

3 video sequence. 
33. The computer-readable medium of claim 31, wherein in said (a) each 
of said plurality of frames is transformed into a histogram vector indicative of a 
spatial distribution of colors in said each of said plurality of frames. 

34. The computer-readable medium of claim 33, wherein in said (a) each 
of said plurality of frames is divided into a plurality of blocks, each of said plurality 
of blocks being represented by a histogram in a color space indicative of a 
distribution of colors within each of said plurality of blocks. 

35. The computer-readable medium of claim 33, wherein each of said 
plurality of frames is divided into a plurality of blocks and each said histogram vector 
comprises a plurality of histograms in a color space, each of said plurality of 
histograms corresponding to one of said plurality of blocks. 

36. The computer-readable medium of claim 32, wherein said refined 
feature space representation is obtained using a singular value decomposition of said 
input video sequence. 

37. The computer-readable medium of claim 36, wherein said singular 
value decomposition is performed using frames selected with a fixed interval from 
said input video sequence. 

38. The computer-readable medium of claim 37, wherein said selected 
frames are arranged into a feature frame matrix, and wherein said singular value 
decomposition is performed on said feature frame matrix. 

39. The computer-readable medium of claim 33, wherein said singular 
value decomposition produces a matrix, each column of said matrix representing a 
frame in a refined feature space corresponding to a frame in said input video 
sequence. 
1 40. The computer-readable medium of claim 31, further comprising (d) 

2 using said clustered frames to output a video representative of a summary of said 

3 input video sequence. 

1 41. The computer-readable medium of claim 31, further comprising (d) 

2 outputting a plurality of keyframes, each of said plurality of keyframes representative 

3 of said clustered frames. 

1 42. The computer-readable medium of claim 32, wherein said selecting 

2 comprises locating a cluster closest to an origin of said refined feature space. 

1 43. The computer-readable medium of claim 32, wherein said (c) 

2 comprises: 

3 (1) sorting a plurality of vectors in said refined feature space in 

4 ascending order according to a distance of each of said vectors 

5 to an origin of said refined feature space; 

6 (2) selecting a vector among said sorted vectors which is closest to 

7 an origin of said refined feature space and including said 

8 selected vector into a first cluster; 

9 (3) clustering said plurality of sorted vectors in said refined feature space 

10 into a plurality of clusters according to a distance between each 

11 of said plurality of sorted vectors and each of said plurality of 

12 clusters and an amount of information in each of said plurality 

13 of clusters. 

1 44. The computer-readable medium of claim 38, wherein in said (3) said 

2 plurality of sorted vectors are clustered into said plurality of clusters such that said 

3 amount of information in each of said plurality of clusters does not exceed an amount 

4 of information in said first cluster. 
45. The computer-readable medium of claim 38, wherein said first cluster 
is composed of frames based on a distance variation between said frames and said 
first cluster. 

46. The computer-readable medium of claim 38, wherein each of said 
plurality of clusters is composed of frames based on a distance variation between said 
frames and said each of said plurality of clusters. 

47. A computer-readable medium containing a program for summarizing a 
content of an input video sequence, said program comprising: 

(a) selecting frames with a fixed interval from said input video 
sequence; 

(b) creating a feature frame matrix using said selected frames; 

(c) performing a singular value decomposition on said feature 
frame matrix to obtain matrix representing said video sequence 
in refined feature space; 

(d) selecting a cluster in said refined feature space corresponding 
to a most static video segment; 

(e) computing a content value corresponding to said selected 
cluster; 

(f) using said computed content value to cluster frames in said 
input video sequence. 

48. A computer-readable medium containing a program for segmenting an 
input video sequence, said input video sequence comprising a plurality of frames, said 
plurality of frames being grouped into a plurality of video shots, said program 
comprising: 

(a) computing a similarity between each of said plurality of frames 
and a subsequent in time frame; 

(b) segmenting said input video sequence into a plurality of shots 
according to said computed similarity. 
1 49. The computer-readable medium of claim 18, wherein said similarity is 

2 calculated using a refined feature space representation of said input video sequence. 

1 50. The computer-readable medium of claim 19, wherein said refined 

2 feature space representation is created using a singular value decomposition of said 

3 input video sequence. 

1 51. The computer-readable medium of claim 20, wherein said singular 

2 value decomposition is performed using frames selected with a fixed interval from 

3 said input video sequence. 

1 52. The computer-readable medium of claim 21, wherein said selected 

2 frames are arranged into a feature frame matrix, and wherein said singular value 

3 decomposition is performed on said feature frame matrix. 

1 53. The computer-readable medium of claim 22, wherein said performed 

2 singular value decomposition produces a matrix, each column of said produced 

3 matrix comprising a frame in said refined feature space representing a frame in said 

4 input video sequence. 

1 54. The computer-readable medium of claim 18, wherein said program 

2 further comprises (c) extracting features from each of said plurality of video shots. 

3 55. A computer-readable medium containing a program for determining a 

4 similarity between a first and a second frames in an input video sequence, said 

5 program comprising: 

6 (a) calculating a refined feature space representation of said input video 

7 sequence; and 

8 (b) using said calculated representation to compute said similarity between 

9 said first and said second frames. 
10 56. The computer-readable medium of claim 25, wherein in said (a) said 

11 refined feature space representation is calculated using a singular value 

12 decomposition. 

1 57. The computer-readable medium of claim 18, wherein in said (b) said 

2 computed similarity is compared to at least two threshold similarities, and said input 

3 video sequence is segmented according to a result of said comparison. 

1 58. The computer-readable medium of claim 48, wherein if in said (b) said 

2 computed similarity is below a first threshold similarity, said each of said plurality of 

3 frames is put into a one of said plurality of video shots containing said precedent in 

4 time frame. 

5 59. The computer-readable medium of claim 48, wherein if in said (b) said 

6 computed similarity is above a second threshold similarity, said each of said plurality 

7 of frames is designated as a shot boundary. 

8 60. The computer-readable medium of claim 48, wherein if in said (b) said 

9 computed similarity is between a first threshold similarity and a second threshold 

10 similarity, said each of said plurality of frames is put into a one of said plurality of 

11 video shots according to a further analysis performed using additional frames from 

12 said plurality of frames. 

1 61. The method of claim 18, further comprising (c) extracting features 

2 from each of said plurality of video shots and using said extracted features to index 

3 said plurality of video shots. 

1 62. The method of claim 61, wherein said extracted features are features of 

2 a video frame representative of said each of said plurality of video shots. 

1 63. The computer-readable medium of claim 48, wherein said program 

2 further comprises (c) extracting features from each of said plurality of video shots and 

3 using said extracted features to index said plurality of video shots. 
1 64. The method of claim 63, wherein said extracted features are features of 

2 a video frame representative of said each of said plurality of video shots. 

1 65. A method of calculating a degree of visual changes in a video shot, 

2 said video shot comprising a plurality of frames, said method comprising: 

3 (a) performing a singular value decomposition on said plurality of frames, 

4 wherein said singular value decomposition produces a matrix, each 

5 column of said matrix representing a frame in a refined feature space 

6 corresponding to a frame in said plurality of frames; 

7 (b) using said matrix to calculate said degree of visual changes in said 

8 video shot. 

1 66. The method of claim 65, wherein said (b) comprises calculating said 

2 degree of visual changes in said video shot as a sum $\sum_{j=1}^{\mathrm{rank}(A)} v_{ij}^{2}$, wherein $v_{ij}$ are 

3 elements of said matrix. 
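
As a hedged illustration of the sum in claim 66, the sketch below assumes the sum has the form $\sum_{j=1}^{\mathrm{rank}(A)} v_{ij}^{2}$, that the elements $v_{ij}$ for one frame $i$ are supplied as a plain list, and that `rank` is known. The function name is hypothetical and the decomposition that produces the matrix is not shown.

```python
from typing import Sequence


def degree_of_visual_change(v_row: Sequence[float], rank: int) -> float:
    """Sum of squared elements v_ij for j = 1..rank (assumed form of the sum)."""
    if rank < 0 or rank > len(v_row):
        raise ValueError("rank must be between 0 and len(v_row)")
    return sum(v * v for v in v_row[:rank])
```

Elements beyond index `rank` are ignored, matching the upper limit of the assumed sum.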

1 67. A computer-readable medium containing a program for calculating a 

2 degree of visual changes in a video shot, said video shot comprising a plurality of 

3 frames, said program comprising: 

4 (a) performing a singular value decomposition on said plurality of frames, 

5 wherein said singular value decomposition produces a matrix, each 

6 column of said matrix representing a frame in a refined feature space 

7 corresponding to a frame in said plurality of frames; 

8 (b) using said matrix to calculate said degree of visual changes in said 

9 video shot. 

1 68. The computer-readable medium of claim 67, wherein said (b) 

2 comprises calculating said degree of visual changes in said video shot as a sum 

3 $\sum_{j=1}^{\mathrm{rank}(A)} v_{ij}^{2}$, wherein $v_{ij}$ are elements of said matrix. 
1 69. A method of calculating an evenness of color distributions in a video 

2 shot, said video shot comprising a plurality of frames, said method comprising: 

3 (a) performing a singular value decomposition on said plurality of frames, 

4 wherein said singular value decomposition produces a matrix, each 

5 column of said matrix representing a frame in a refined feature space 

6 corresponding to a frame in said plurality of frames; 

7 (b) using said matrix to calculate said evenness of color distribution in 

8 said video shot. 

1 70. The method of claim 69, wherein said (b) comprises calculating said 

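2 evenness of color distribution in said video shot as a sum $\sum_{j=1}^{\mathrm{rank}(A)} \sigma_{j}^{2} v_{ij}^{2}$, wherein 

3 said $v_{ij}$ are elements of said matrix and said $\sigma_{j}$ are singular values obtained in said 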

4 singular value decomposition. 
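
The singular-value-weighted sum of claim 70 can be sketched in the same way. This assumes the sum takes the form $\sum_{j=1}^{\mathrm{rank}(A)} \sigma_{j}^{2} v_{ij}^{2}$; the squared exponents and the function name are assumptions, and the singular values are taken as given.

```python
from typing import Sequence


def evenness_of_color_distribution(v_row: Sequence[float],
                                   singular_values: Sequence[float],
                                   rank: int) -> float:
    """Weighted sum of sigma_j^2 * v_ij^2 for j = 1..rank (assumed form)."""
    if rank < 0 or rank > min(len(v_row), len(singular_values)):
        raise ValueError("rank exceeds the available elements")
    return sum((s * v) ** 2
               for s, v in zip(singular_values[:rank], v_row[:rank]))
```

Setting every singular value to 1 reduces this to the unweighted sum of squared elements.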

1 71. A computer-readable medium containing a program for calculating an 

2 evenness of color distributions in a video shot, said video shot comprising a plurality 

3 of frames, said method comprising: 

4 (a) performing a singular value decomposition on said plurality of frames, 

5 wherein said singular value decomposition produces a matrix, each 

6 column of said matrix representing a frame in a refined feature space 

7 corresponding to a frame in said plurality of frames; 

8 (b) using said matrix to calculate said evenness of color distribution in 

9 said video shot. 

1 72. The computer readable medium of claim 71, wherein said (b) 

2 comprises calculating said evenness of color distribution in said video shot as a sum 

3 $\sum_{j=1}^{\mathrm{rank}(A)} \sigma_{j}^{2} v_{ij}^{2}$, wherein said $v_{ij}$ are elements of said matrix and said $\sigma_{j}$ are 

4 singular values obtained in said singular value decomposition. 